The goal of this research project is to find patterns in emissions from the transportation industry. Using summaries and visualizations, I would like to draw a detailed conclusion about emissions by understanding the types of gas emitted over the study period. I want to see whether emissions of the more harmful pollutants are gradually declining, and which sector of the transportation industry is most responsible. The data come from Climate TRACE, a nonprofit organization that uses satellite imagery to build emissions data sets; I found it through the website “Data Is Plural”.
Research Question
I am interested in the trajectory of emissions in the transportation industry, to see whether new government legislation and clean energy investments have been effective at reducing emissions in the United States. Climate change has been hotly contested in the United States for years as people argue over the best path forward. Citizens have protested new pipelines, demanded that politicians create policies to move away from fossil fuels, and are increasingly frustrated with the lack of action. One of the highest emitters is the transportation industry. When we think of mass polluters, we tend to think of the electric, gas, and waste industries. However, we need to think more deeply: how do the majority of people get to work, buy groceries, and have mail and packages delivered? The transportation industry is the answer, which leads to another question: “How many tons of emissions are emitted annually, and how has this changed since the push for clean energy began in 2015?”
Data Selection
The analysis of transportation emissions matters to several business areas, including the environmental, transportation, and utility industries. Analysts at these companies use the data to determine next steps in their projects, whether the aim is to reduce emissions, comply with government regulations, or present a positive public message. Understanding the trajectory of transportation emissions can help us create a cleaner future for the next generation. Climate TRACE produced this information using satellite imagery, the Database of Road Transportation Emissions (DARTE), and Annual Average Daily Traffic (AADT) data from the U.S. Highway Performance Monitoring System. By combining these sources with machine learning, its model estimates annual emissions by transportation sector and type of gas within the United States. The goal of this analysis is to understand whether emissions are decreasing, but it is also important to understand how harmful the emitted gases are.
As mentioned above, the most important variables in the data are annual emissions, sector of transportation, and type of gas. I want to see whether annual emissions have decreased since 2015 and whether government and corporate actions have been working. The next stage is to understand which gas and which sector of transportation produce the most annual emissions within the United States.
Looking at the data set, I found a couple of problems: most columns are undefined, confusing, or not necessary for this research. We also need to understand our variables, in particular the types of gas being emitted. The most common greenhouse gases in the data are carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O). The final type is carbon dioxide equivalent (CO2e), a measure that expresses the warming effect of other greenhouse gases as the amount of CO2 that would cause the same warming. I expect carbon dioxide to make up a significant share of transportation emissions, since it is produced by burning fossil fuels like oil, which our nation's transportation relies on almost entirely, with the remaining gases making up a smaller proportion of the data.
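As a rough illustration of how CO2e works, each gas's mass is multiplied by its global warming potential (GWP) relative to CO2. The sketch below uses approximate 100-year GWP values (about 28 for methane and 265 for nitrous oxide, in line with IPCC AR5). Both the GWP values and the example tonnages are assumptions for illustration, not figures from the Climate TRACE data, which may use different factors.

```r
# Approximate 100-year global warming potentials (assumed, AR5-style values)
gwp_100yr <- c(co2 = 1, ch4 = 28, n2o = 265)

# Hypothetical emissions in tonnes of each gas (not real data)
tonnes <- c(co2 = 1000, ch4 = 10, n2o = 1)

# CO2e is each gas's mass weighted by its GWP, then summed
co2e_by_gas <- tonnes * gwp_100yr[names(tonnes)]
co2e_total <- sum(co2e_by_gas)

print(co2e_by_gas)
print(co2e_total)
```

Because CO2e already includes the CO2 itself, adding a CO2 figure and a CO2e figure together would count the same emissions twice.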
Data Manipulation
To clean the data, we need to install and load the tidyverse, a collection of packages that includes dplyr.
options(scipen=999)
The tidyverse, and dplyr in particular, lets us edit rows and columns in R, for example by renaming columns. Another important step is finding how many rows and columns are in the data set, which we can do with dim().
dim(covid.data)
Error in eval(expr, envir, enclos): object 'covid.data' not found
Rows = 190, Columns = 10
The result shows 190 10, which means there are 190 rows and 10 columns of data. In R, one row equals one observation. Likewise, one column equals one variable in the data set, so 10 columns means 10 variables. Now, I want to read my data set.
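To make the reading of dim() concrete, here is a minimal sketch on a small made-up data frame. The name toy_emissions and its values are hypothetical and are not part of the project data.

```r
# Hypothetical toy data frame (not the Climate TRACE data)
toy_emissions <- data.frame(
  gas = c("co2", "ch4", "n2o"),
  emissions_quantity = c(1000, 10, 1)
)

# dim() returns the number of rows first, then the number of columns
dims <- dim(toy_emissions)
n_rows <- nrow(toy_emissions)  # observations
n_cols <- ncol(toy_emissions)  # variables

print(dims)
```

Note that dim() only works on an object that already exists in the R session, so the data has to be read in before it is called.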
Reading the data produced six column-specification messages, in two distinct forms. Three of them were:
Rows: 35 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): iso3_country, original_inventory_sector, gas, emissions_quantity_u...
dbl (1): emissions_quantity
lgl (1): temporal_granularity
dttm (4): start_time, end_time, created_date, modified_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The other three were:
Rows: 40 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso3_country, original_inventory_sector, gas, emissions_quantity_u...
dbl (1): emissions_quantity
dttm (4): start_time, end_time, created_date, modified_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining Data
Opening the downloaded data reveals that I am missing some important information, such as other sectors of transportation. I would like to create bivariate graphs, which plot two variables, so further steps are needed to organize the data and prepare it for visualization. The first step is combining the data sets using the function full_join().
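The joining step might look roughly like the sketch below. The three small data frames, their names, and the sector labels are hypothetical stand-ins for the downloaded files; only the column names are taken from the column specification shown earlier.

```r
library(dplyr)

# Hypothetical per-sector data frames sharing the same columns
road <- tibble(
  original_inventory_sector = "road-transportation",
  gas = c("co2", "ch4"),
  emissions_quantity = c(100, 2)
)
aviation <- tibble(
  original_inventory_sector = "domestic-aviation",
  gas = c("co2", "ch4"),
  emissions_quantity = c(10, 0.1)
)
shipping <- tibble(
  original_inventory_sector = "shipping",
  gas = c("co2", "ch4"),
  emissions_quantity = c(5, 0.05)
)

# Joining on every shared column keeps all rows from both sides
shared_cols <- c("original_inventory_sector", "gas", "emissions_quantity")

# Combine one data set at a time so each step can be inspected
master <- full_join(road, aviation, by = shared_cols)
master <- full_join(master, shipping, by = shared_cols)

print(master)
```

When every file has identical columns, bind_rows() would stack them in the same way; full_join() is used here because it is the function named in the text.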
I chose to combine these data sets one at a time so I could view them and make sure all of the information was joined successfully. The master file shows all of the transportation inventory sectors together with their values. Because the data was generated by automated processing, it includes some columns that are not needed, which can be removed using the function select().
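A minimal sketch of the column selection follows. The data frame master and its values are made up, and the choice of which columns to keep is an assumption based on the variables the analysis uses later.

```r
library(dplyr)

# Hypothetical joined data with a few of the original columns
master <- tibble(
  iso3_country = "USA",
  original_inventory_sector = c("road-transportation", "domestic-aviation"),
  gas = "co2",
  emissions_quantity = c(100, 10),
  start_time = as.POSIXct(c("2015-01-01", "2015-01-01"), tz = "UTC"),
  created_date = as.POSIXct("2022-11-01", tz = "UTC"),
  modified_date = as.POSIXct("2022-11-02", tz = "UTC")
)

# Keep only the columns needed for the research question (assumed set)
trimmed <- select(
  master,
  original_inventory_sector, gas, emissions_quantity, start_time
)

print(names(trimmed))
```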
Our main data set now contains all of the columns relevant to our research question. One remaining issue is that I do not want a full date, only the year, so we need to change the data using the function mutate().
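One way the year could be extracted is sketched below, using base R's format() inside mutate(). The data frame and its values are hypothetical, and this is not necessarily the exact call used in the project; lubridate's year() would be an equivalent alternative.

```r
library(dplyr)

# Hypothetical data with a full date-time in start_time
emissions <- tibble(
  gas = c("co2", "co2"),
  emissions_quantity = c(100, 110),
  start_time = as.POSIXct(
    c("2015-01-01 00:00:00", "2016-01-01 00:00:00"),
    tz = "UTC"
  )
)

# Overwrite start_time with just the four-digit year as an integer
emissions <- mutate(
  emissions,
  start_time = as.integer(format(start_time, "%Y"))
)

print(emissions)
```

Storing the year as a number rather than text keeps later comparisons and axis ordering straightforward.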
In our data set, the column start_time now shows only the year instead of the entire date, which makes creating a visualization easier. The column names can also be updated to make the data set cleaner. Using the rename() function, we can rename multiple columns in one command.
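A sketch of renaming several columns in one rename() call is shown below. The new names Year, Section_of_Transportation, and Type_of_Gas appear later in this report; Annual_Total_Emissions and the mapping from the original columns are my assumptions, and the data is made up.

```r
library(dplyr)

# Hypothetical data using the original column names
emissions <- tibble(
  original_inventory_sector = c("road-transportation", "domestic-aviation"),
  gas = c("co2", "co2"),
  emissions_quantity = c(100, 10),
  start_time = c(2015L, 2015L)
)

# rename() takes new_name = old_name pairs; column order is unchanged
renamed <- rename(
  emissions,
  Section_of_Transportation = original_inventory_sector,
  Type_of_Gas = gas,
  Annual_Total_Emissions = emissions_quantity,
  Year = start_time
)

print(names(renamed))
```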
Some of the joined data sets had statistics for 2022 while others did not. We cannot include 2022, because the incomplete information would give inaccurate results. We need to filter out all data for the year 2022, which I will do using the function filter().
# Keep only 2015-2021; compare Year with a number rather than the string '2022'
TransportationEmissions_2015_2021 <- filter(TransportationEmissions_2015_2021, Year < 2022)
Data Frame for Total 2015-2021 Transportation Emissions
After improving our data set so that the variables and observations are easy to understand, I am confident that a summary of the data can be created. To summarize the data, we need to isolate and arrange some variables in a new data frame.
      Year      Sum_of_Annual_Total_Emissions
 Min.   :2015   Min.   :4480136779
 1st Qu.:2016   1st Qu.:5038651667
 Median :2018   Median :5191797816
 Mean   :2018   Mean   :5086377127
 3rd Qu.:2020   3rd Qu.:5267393335
 Max.   :2021   Max.   :5320615286
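A summary like the one above could be produced along these lines. This is a sketch on made-up numbers: the input data frame toy and the column Annual_Total_Emissions are assumptions, while net_emissions and Sum_of_Annual_Total_Emissions are the names used in this report.

```r
library(dplyr)

# Hypothetical cleaned data (values are not real)
toy <- tibble(
  Year = c(2015, 2015, 2016, 2016),
  Section_of_Transportation = c("road", "aviation", "road", "aviation"),
  Annual_Total_Emissions = c(100, 10, 120, 12)
)

# One row per year holding the sum of emissions across sectors and gases
net_emissions <- toy %>%
  group_by(Year) %>%
  summarise(Sum_of_Annual_Total_Emissions = sum(Annual_Total_Emissions)) %>%
  arrange(Year)

# summary() reports min, quartiles, median, mean, and max per column
print(summary(net_emissions))
```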
Our new data frame shows total emissions on a yearly basis since 2015. Emissions rose from 2015 to 2018, likely because our economy was growing quickly and globalization required more transportation to move goods. 2018 is the highest year of emissions, likely linked to our growing economy and the need for cheap transportation. 2020 is the lowest year of emissions. A major reason is COVID-19, which limited travel between countries and reduced commuting and leisure trips. Another possible reason for the decrease is the threat of climate change, with individuals wanting to be ecologically friendly and demanding clean energy investment from the government.
2021 Transportation Emissions
I need to separate 2021 emissions from the other years to determine which transportation sector has the highest emissions and to build one of the deliverable visualizations. The next task is creating a new data frame with the sum of emissions for each sector.
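The per-sector totals for 2021 could be built roughly as follows. The input data and sector labels are hypothetical, and Annual_Total_Emissions is an assumed column name; Emissions2021, Section_of_Transportation, and Sum_of_Annual_Total_Emissions are the names used in this report.

```r
library(dplyr)

# Hypothetical cleaned data covering two years, two sectors, two gases
toy <- tibble(
  Year = c(2020, 2021, 2021, 2021, 2021),
  Section_of_Transportation = c(
    "road-transportation", "road-transportation", "road-transportation",
    "domestic-aviation", "domestic-aviation"
  ),
  Type_of_Gas = c("co2", "co2", "ch4", "co2", "ch4"),
  Annual_Total_Emissions = c(90, 100, 2, 10, 1)
)

# Keep 2021 only, total by sector, and sort so the largest emitter is first
Emissions2021 <- toy %>%
  filter(Year == 2021) %>%
  group_by(Section_of_Transportation) %>%
  summarise(Sum_of_Annual_Total_Emissions = sum(Annual_Total_Emissions)) %>%
  arrange(desc(Sum_of_Annual_Total_Emissions))

print(Emissions2021)
```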
In this data frame, road transportation accounts for the majority of emissions. All other sectors combined are smaller, which tells us that road transportation is where reducing emissions should be a priority. Our final data frame will show the type and amount of gases emitted from 2015 to 2021.
Our data shows that the majority of gases produced within the transportation industry are carbon dioxide (CO2) or carbon dioxide equivalent (CO2e), with a small fraction being methane and nitrous oxide.
Now that all of our data frames have been created, we can begin our visualizations.
Data Visualization
To make the correct visualizations, we need to understand the variables being graphed. A discrete variable takes values from a countable set. Our Year column is an example, with values like 2015, as are categories such as the sector of transportation. A continuous variable is measured rather than counted and can take any value in a range, such as annual emissions. We are going to use two variables in every plot, so we will be using bivariate graphs.
Using this information, I classified my variables as either discrete or continuous in order to find the best visualization. I tried a few different plots, such as a jitter plot and a bar graph, but decided that a count plot combined with a line graph was best. In the net emissions data, year is discrete and annual emissions is continuous. The count plot marks each value precisely and clearly, and the line graph shows the change over time.
After deciding on my two graphs, I used the two variables, Year and Sum_of_Annual_Total_Emissions, from the net_emissions data frame to create the visualization.
# Line plus points showing total emissions for each year
ggplot(net_emissions, aes(x = Year, y = Sum_of_Annual_Total_Emissions)) +
  geom_line(aes(group = 1)) +
  geom_count(colour = "darkblue", fill = 4) +
  labs(title = "Net Emissions By Year", y = "Total Emissions", x = "Year") +
  theme(
    plot.title = element_text(size = 26, face = "bold", color = "black"),
    # element_blank() hides the axis titles set in labs()
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none"
  )
The plot shows how transportation emissions changed from 2015 to 2021. There is a significant drop in emissions in 2020, while emissions increased in most other years.
Our next step is creating a plot for 2021 using the Emissions2021 data frame to determine which sector has the most emissions. I will use a column graph because we are analyzing one discrete variable and one continuous variable.
# One column per transportation sector for 2021
ggplot(Emissions2021, aes(x = Section_of_Transportation, y = Sum_of_Annual_Total_Emissions)) +
  geom_col(colour = "darkblue", fill = 4) +
  labs(title = "Transportation Sector Emissions in 2021", y = "Total Emissions", x = "Transportation Sector") +
  theme(
    plot.title = element_text(size = 22, face = "bold", color = "black"),
    # element_blank() hides the axis titles set in labs()
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none"
  )
The visualization shows that road transportation is by far the most significant emitter, with small contributions from the remaining sectors.
The final visualization needed to answer our research question is the amount of emissions by gas from 2015 to 2021, to understand which gases are most prominent in the transportation industry. I will use a column graph once again.
# One column per gas type, totaled over 2015-2021
ggplot(GasTypeEmissionTotal, aes(x = Type_of_Gas, y = Sum_of_Annual_Total_Emissions)) +
  geom_col(colour = "darkblue", fill = 4) +
  labs(title = "Gas Emission Total", y = "Total Emissions", x = "Type of Gas") +
  theme(
    plot.title = element_text(size = 26, face = "bold", color = "black"),
    # element_blank() hides the axis titles set in labs()
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none"
  )
The results show that almost all emissions are carbon dioxide or carbon dioxide equivalent gases.
While completing these visualizations, I found some interesting results. In my first plot, emissions slowly increased until the peak in 2018, with a small decrease in 2019. The major finding is a sharp decline in emissions in 2020. A few factors may be involved. COVID-19 began to affect the global economy as people sheltered in place, with only essential travel happening in the US, such as getting groceries or going to work as an essential worker. The growing popularity of clean transportation and climate activism has also led people to change their lifestyles, for example by not owning a car, switching to an EV, or using public transportation, all of which reduce on-road emissions. As the United States emerged from the pandemic in 2021, emissions increased, but not to the 2018 peak. People began to get vaccinated and felt more comfortable in public, so they returned to the office and to leisure activities, resulting in higher emission levels. However, the shift to cleaner transportation and continued remote work slowed the increase.
The sector data confirms that road transportation causes the majority of emissions. This is not a surprise to me, as most people in the United States own a car and drive from one destination to another; it is the most accessible mode of transportation in rural and urban areas. Many jobs depend on cars, since employees need vehicles to get to work, complete their work, and go home. I expected domestic aviation to have more emissions as people began to fly commercially for business, family, or leisure once states reopened to travelers, but I was surprised to see a value comparable to international aviation, which was restricted for countries with high COVID-19 rates.
My final analysis shows that the majority of gas emitted in the transportation industry is carbon dioxide or equivalent gases. I was surprised by how little methane and nitrous oxide was emitted, given the aviation industry and the production of equipment. I did expect a substantial amount of carbon dioxide emissions, as vehicles are the lifeline of our economy and the country is moving to electric vehicles at a slow rate. In the meantime, we rely on gas-powered vehicles, which cause higher carbon dioxide emissions. I expect this number to decrease as cleaner forms of transportation and power become more prevalent.
Reflection
This was my first research project in data science, as I decided on a career change when I entered the Data Analytics and Computational Social Science program. I began the project with little understanding of R. I had prepared with a mini course in SQL but was unsure how that knowledge would transfer to the project. Over time, I became more confident through homework, trial and error, and asking tutors for help. At first, I planned to do a project on electric vehicle sales in Washington. However, I was unable to find sufficient data on vehicle pricing and average vehicle lifespan to join together and determine whether an EV was a good purchase for an average American, so I changed my approach and looked at overall emissions in the transportation industry. I work in the utility industry, closely with our Fleet department, so I wanted to see any correlation between greenhouse gas emissions and road transportation, along with a comparison to other sectors. It would help me visualize what I want to achieve in my own work role. Our company's goal is to fully electrify our fleet by 2050 and bring our carbon footprint to zero. I wanted to see whether our emissions as a country were decreasing, and which sector and which type of gas are most responsible.
My most challenging problem was converting the data from scientific notation into regular form. For a while I was confused about why my numbers were so inaccurate; for example, total methane emissions appeared much higher than carbon dioxide. The problem was related to how the data was initially downloaded and loaded into R. Other data sets I checked did not have this issue. I also knew the numbers were wrong because vehicles run on gasoline, which emits carbon dioxide, while methane and nitrous oxide are more common in other industries like agriculture. I was unsure whether the code was typed wrong or my charts were incomplete. I second-guessed myself and made the problem seem a lot harder than it was. Eventually I asked for help and realized that the number formatting in R created the issue. My data set and code were correct. I used the options() function to format my data correctly for visualization.
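The formatting behavior described above can be reproduced with a small sketch. The value used is illustrative rather than taken from the data; the point is that the scipen option changes only how numbers are displayed, not the stored values.

```r
# Start from R's default penalty and remember the previous setting
old <- options(scipen = 0)

# A large round number is shown in scientific notation by default (4.48e+09)
sci <- format(4480000000)

# A large scipen value makes R prefer fixed notation when printing
options(scipen = 999)
fixed <- format(4480000000)

# Restore the previous setting
options(old)

print(sci)
print(fixed)
```

Setting the option once near the top of a script, as this report does with options(scipen=999), applies it to all later output in the session.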
I did not experience much trouble with the visualizations, which surprised me. For a different project, I had made a visualization of a United States map and spent a few hours unable to fix a function that kept giving me errors. Making that visualization in my other class taught me how to modify a plot to make it clean, how to decide what is important for readers to know, and the technical coding skills that helped me complete this project.
For this project, I wish I had access to emissions by state. I would have liked to compare New England's emissions with those of a region with a similar population, like Los Angeles or Maryland and Virginia, to see if New England is leading emissions reduction in the United States. I could then understand what we are doing right to reduce emissions, or what we should do differently if we are in the higher percentiles.
I also wish more information had been available on greenhouse gas emissions from other industries, like agriculture and utilities, to compare with the transportation industry. A broad view of business opinions and plans to address climate change would be interesting to learn about and analyze. If I continued the project into a new phase, I would compare emissions in the transportation industry with the agriculture and utility industries, which have produced large amounts of gases. The data could show whether business in America is becoming greener or whether this is an illusion and no meaningful change has happened.
Conclusion
Our research question asks, “How many tons of emissions are emitted annually, and how has this changed since the push for clean energy began in 2015?”
The analysis shows that an average of approximately 5,086,377,127 tons of emissions were emitted annually from 2015 to 2021. Over the last two years, emissions have been below this mean, which suggests positive change. Important factors in the reduction include the COVID-19 pandemic in 2020, electrification programs in transportation, and less commuting to the office. People see climate change as a significant threat and realize action must be taken to protect the future. This mindset encourages government to create rebates for clean transportation, investment in infrastructure, and eco-friendly legislation.
The road transportation sector was responsible for the majority of emissions from 2015 to 2021, with most of those emissions being carbon dioxide. This supports our theory that a significant amount of emissions comes from daily tasks essential to life: getting groceries, commuting to work, seeing friends and family. Vehicles with internal combustion engines make up the vast majority of the United States car market due to their lower initial price, quicker refueling, and the lack of infrastructure for electric and hydrogen vehicles in most of the country. Many work vehicles, like trucks and construction equipment, cannot yet use batteries as their main power source because the technology is not widely available. As non-internal-combustion vehicles become more accessible, carbon dioxide emissions from road transportation should fall, and hopefully this will extend to other transportation sectors like aviation.
As a result, we have determined that emissions have increased since the 2020 low caused by the COVID-19 pandemic but remain below pre-pandemic levels, and I expect them to decrease in the coming years. The road transportation sector and carbon dioxide related gases are most responsible for emissions in our research. Using this information, we can begin to change the transportation industry and build a better tomorrow.
Bibliography
Climate TRACE, https://climatetrace.org/downloads. < Raw Data >
Singer-Vine, Jeremy. “Big Emitters.” Data Is Plural, 16 Nov. 2022, https://www.data-is-plural.com/archive/2022-11-16-edition/. < Where Data was Obtained >
“On Road.” Climate TRACE, 9 Nov. 2022, https://climatetrace.org/public/upload/files/62f50cfb415f6.pdf?v=1667641844. < How Transportation Data was Gathered >
“Federal Funding Programs.” U.S. Department of Transportation, https://www.transportation.gov/rural/ev/toolkit/ev-infrastructure-funding-and-financing/federal-funding-programs. Accessed 17 Dec. 2022. < Federal EV Programs >
“Overview of Greenhouse Gases.” EPA, Environmental Protection Agency, 16 May 2022, https://www.epa.gov/ghgemissions/overview-greenhouse-gases. < Analysis of Gases >
Rabo, Olga. “What Is CO2E and How Is It Calculated?” Full Impact View of Your Investments - Cooler Future, 18 Nov. 2020, https://www.coolerfuture.com/blog/co2e. < CO2e Defined >